The name "Andys Binary Folding Editor" is currently a lie. At present, this program only allows structured browsing, with no actual editing per se.
This program is designed to take in a set of binary files and, with the aid of an initialisation file, decode and display the structures within them. BE is particularly suited to displaying fixed-length structures within the files.
This makes examination of known file types easy, and allows rapid and reliable navigation of memory dumps.
usage: be [-w width] [-h height] [-i inifile] {-I incpath} {-D symbol}
          [-d defn] [-a addr] [-y symfile] {binfile[@addr]}
flags: -w width       screen width
       -h height      screen height
       -i inifile     override default initialisation file
       -I incpath     append include path(s) for use by inifile
       -D symbol      pre-$define symbol(s) for use by inifile
       -d defn        initial definition to use (default: main)
       -a addr        initial address to use (default: 0)
       -y symfile     input symbol table file
       binfile@addr   binary file(s) (with optional address, default: 0)
The -w and -h arguments can be used to try to override the current screen size. This doesn't work on UNIX, but does on OS/2.
The -i flag overrides the default initialisation file.
The -I flag affects the operation of the include command in the initialisation file.
The -D flag allows the definition of symbols which may be accessed via the $ifdef and similar directives in the initialisation file.
The initial structure definition and address to decode may be overridden with the -d and -a flags. Normally BE starts by looking up the definition of a 'main' structure, and decoding the data at address 0 as such.
A symbol table may be specified using the -y flag.
Each line of the symbol table is of the form :-
symbolname 472484aa
Note that the address is in hex, and is not preceded by 0x. This conveniently matches the symbol table layout generated by the ARM linker.
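As an illustration of the symbol table format, here is a minimal Python sketch of a reader for such a file. It is not BE's own code; the function name `parse_symbol_table`, the second symbol `noptable` with its address, and the choice to skip malformed lines are all assumptions made for the example.

```python
def parse_symbol_table(text):
    """Parse 'symbolname hexaddress' lines into a dict of name -> address."""
    symbols = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # assumption: blank or malformed lines are skipped
        name, hex_addr = parts
        symbols[name] = int(hex_addr, 16)  # hex digits, no 0x prefix needed
    return symbols


# "noptable 00008000" is a made-up entry for illustration.
table = parse_symbol_table("symbolname 472484aa\nnoptable 00008000\n")
print(hex(table["symbolname"]))
```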
Multiple input binary files can be specified, and they should be loaded at non-overlapping address ranges.
Typical invocations of BE might be :-
be -y gizmo.sym gizmo.rom gizmo.ram@0x8000
be picture.bmp
One of the first things BE does is to find and load the initialisation file, which tells BE the layout of various file formats and the structures within them.
Under OS/2 or Windows, BE finds the initialisation file by searching along the path for an .INI file with the same name. Under UNIX, BE looks for ~/.berc (or ~/.xxrc if the be executable is renamed to xx). BE can be made to look elsewhere using the -i command line option.
This initialisation file may contain C or C++ style comments.
Also, $define, $ifdef, $ifndef, $else, $endif and $error are supported, as a form of pre-processing/conditional processing step. The -D command line option may be used to pre-$define such conditional processing symbols.
If BE is running on OS/2, then OS2 is pre-$defined. If running on Windows NT or Windows 95, then WIN32 is pre-$defined. If running on a type of UNIX, then UNIX is pre-$defined. If running specifically on AIX, then AIX is pre-$defined. Either BE or LE will be pre-$defined, depending upon whether BE is running on a big-endian or little-endian machine.
These $defines allow you to write initialisation files with sensible defaults, relevant for the current environment.
An include directive is supported, and included files will be searched for by looking in the current directory, then along an internal include path, and finally along the PATH environment variable. The internal include path is usually empty, but may be appended to by the use of the -I command line option.
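The include search order described here (current directory, then the internal include path, then PATH) can be sketched in Python as follows. This is only an illustration of the ordering, not BE's implementation; the function name `find_include` and its parameters are hypothetical.

```python
import os


def find_include(name, include_path=(), env_path=None):
    """Return the first existing candidate for an included file, or None."""
    if env_path is None:
        env_path = os.environ.get("PATH", "")
    search_dirs = ["."]                                          # 1. current directory
    search_dirs += list(include_path)                            # 2. directories given with -I
    search_dirs += [d for d in env_path.split(os.pathsep) if d]  # 3. PATH
    for directory in search_dirs:
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate):
            return candidate
    return None
```

The first match wins, so a file in the current directory shadows one of the same name further along the search order.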
The initialisation file contains commands to set the default data display attributes, structure definitions, and include statements.
As BE processes the initialisation file, it generates warnings (such as undefined symbol table symbol), and error messages into an internal buffer. If there are no errors, then this buffer is discarded. If there are errors, then all the warnings and errors are listed, and BE aborts.
Wherever the initialisation file calls for a number, the following variants may be used :-
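number - given in binary (eg: 0b1101), octal (eg: 0o15), decimal (eg: 13) or hex (eg: 0x0d).
addr "symbolinthesymboltable" - the address of the named symbol from the symbol table (... 0xffffffff).
sizeof DEFN - the size of the structure definition DEFN.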
Expressions may be constructed by use of brackets and also the following operators, with usual C language meanings, listed highest priority first :-
+, -, ~, ! (unary)
/, *, %, &
+, -, |, ^
eg: addr "tablebase" + 4 * sizeof RGB
Such numeric expressions can be used when BE prompts for a number.
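To make the number formats concrete, here is a small Python sketch. It leans on Python's own prefix handling, which happens to accept the same 0b, 0o and 0x prefixes. It is not BE's parser. The symbol address and the size of RGB used in the worked example are invented values.

```python
def parse_number(text):
    """Parse a literal written as 0b binary, 0o octal, 0x hex, or plain decimal."""
    return int(text, 0)  # base 0 makes Python choose the base from the prefix


# All four spellings from the text denote the same value.
values = [parse_number(s) for s in ("0b1101", "0o15", "13", "0x0d")]

# Worked example of: addr "tablebase" + 4 * sizeof RGB
symbols = {"tablebase": 0x8000}  # hypothetical symbol table entry
sizes = {"RGB": 3}               # hypothetical size of the RGB definition, in bytes
result = symbols["tablebase"] + 4 * sizes["RGB"]
print(values, hex(result))
```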
When the program starts parsing the initialisation file, the default data display attributes are le unsigned hex nomul abs nonull nocode nolj noseg.
To change this default setting, just include one or more of the following keywords in the file :-
be - read multibyte values from memory in a big-endian fashion.
le - read multibyte values from memory in a little-endian fashion.
signed - when fetching numeric values sign extend them, and when displaying numerically show '+signedvalue' or '-signedvalue'.
unsigned - when fetching numeric values zero extend them, and when displaying numerically show 'unsignedvalue'.
asc - set display mode to ASCII.
ebc - set display mode to EBCDIC.
bin - set display mode to binary.
oct - set display mode to octal.
dec - set display mode to decimal.
hex - set display mode to hex.
sym - set display mode to symbolic. ie: look up the value in the symbol table, and if found, display symbol+hexoffset, else display value in hex.
null - allow following of 0 pointers.
nonull - disallow following of 0 pointers.
seg - cope with 16:16 segmented pointers.
noseg - pointers are not segmented.
mul - pointer values should be multiplied by the size of the data type being pointed to.
nomul - pointer values are given in regular byte addresses.
abs - pointer values are absolute.
rel - pointer values are to be considered relative to their own addresses.
code - specify that numeric value is actually a code address.
nocode - specify that numeric value is not a code address.
lj - perform ARM specific long-jump interpretation of code addresses.
nolj - don't do long-jump interpretation.
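The effect of the be/le and signed/unsigned keywords on a fetched value can be illustrated with a short Python sketch. The helper names `fetch` and `show_dec` are made up for this example, and the decimal formatting is only an approximation of what BE displays.

```python
def fetch(data, endian="le", signed=False):
    """Interpret raw bytes as a number, as the be/le and signed/unsigned keywords describe."""
    byteorder = "big" if endian == "be" else "little"
    return int.from_bytes(data, byteorder=byteorder, signed=signed)


def show_dec(value, signed=False):
    """Decimal display: signed values carry an explicit + or - sign."""
    return f"{value:+d}" if signed else f"{value:d}"


raw = bytes([0xFE, 0xFF])  # two bytes as they sit in memory
print(show_dec(fetch(raw, "le")))                       # 65534
print(show_dec(fetch(raw, "be")))                       # 65279
print(show_dec(fetch(raw, "le", signed=True), True))    # -2
```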
Maps define a correspondence between symbolic names and numeric values. A typical map definition in the initialisation file might be :-
map compression_type
{
  "uncompressed" 1
  "huffman"      2
  "lzw"          3
}
If the numeric value on display matches the value given, then it can be converted to the textual description.
Bitfields may be achieved in the following fashion :-
map pending_events
{
  "reconfiguration" 0x0001 : 0x0001
  "flush_cache"     0x0002 : 0x0002
  "restart_io"      0x0004 : 0x0004
}
The : symbol introduces an additional mask.
The number to string conversion algorithm inside BE works like this :-
for each maplet in the map
  if ( value & maplet.mask ) == maplet.value then
    display the maplet.name
if some unexplained bits left over then
  display the remaining value in hex
So, it is possible to have multiple field decodes from a single value :-
map twobitfields
{
  "green" 0x0001 : 0x000f
  "blue"  0x0002 : 0x000f
  "red"   0x0003 : 0x000f
  "small" 0x0100 : 0x0f00
  "large" 0x0200 : 0x0f00
}
The value 0x0243 would be converted to red|large|0x40.
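The number to string conversion can be tried out with the following Python sketch. It follows the pseudocode given earlier, but it is not BE's source. Two details are assumptions: a maplet written without a mask is treated as comparing the whole 32-bit value, and "unexplained bits" are taken to be those not covered by the mask of any matching maplet.

```python
FULL_MASK = 0xFFFFFFFF  # assumption: a maplet with no ':' mask compares the whole value


def decode(value, maplets):
    """Convert a number to text. maplets is a list of (name, value, mask) tuples."""
    parts = []
    unexplained = value
    for name, want, mask in maplets:
        if (value & mask) == want:
            parts.append(name)
            unexplained &= ~mask  # bits under this mask are now accounted for
    if unexplained or not parts:
        parts.append(f"0x{unexplained:x}")  # show whatever is left over in hex
    return "|".join(parts)


twobitfields = [
    ("green", 0x0001, 0x000F),
    ("blue",  0x0002, 0x000F),
    ("red",   0x0003, 0x000F),
    ("small", 0x0100, 0x0F00),
    ("large", 0x0200, 0x0F00),
]
print(decode(0x0243, twobitfields))  # red|large|0x40
```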
Structures are a list of at OFFSET clauses and field definitions.
When the structure definition is processed, then the current-offset is initialised to 0. An at OFFSET clause moves the current-offset to the specified numeric value. A field definition defines a field which lives at the current-offset into the structure. After definition of the field, the current-offset is moved to the end of the field, so that the next field will immediately follow it (unless another at OFFSET clause is used).
The size of the structure is the largest value that the current-offset ever attains. This is the value returned whenever sizeof DEFN is used as a number.
The at OFFSET clause allows the same areas of a structure to be displayed in more than one way, thus allowing the implementation of unions.
Duplicate definitions of the same named structure are not allowed.
A structure definition may have zero or more fields and/or at OFFSET clauses.
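The current-offset bookkeeping can be modelled in a few lines of Python. This is a sketch of the rules as stated, not BE's code; the tuple representation of clauses and the field names are invented for the example.

```python
def layout(items):
    """Compute field offsets and structure size.

    items is a list of ("at", offset) or ("field", name, size_in_bytes).
    """
    offset = 0   # the current-offset, initialised to 0
    size = 0     # largest value the current-offset ever attains
    fields = []
    for item in items:
        if item[0] == "at":
            offset = item[1]               # an at OFFSET clause moves the current-offset
        else:
            _, name, field_size = item
            fields.append((name, offset))  # the field lives at the current-offset
            offset += field_size           # then the offset moves to the end of the field
        size = max(size, offset)
    return fields, size


# A union-like layout: the same 4 bytes shown as one word and as two halves.
fields, size = layout([
    ("field", "as_word", 4),
    ("at", 0),
    ("field", "low", 2),
    ("field", "high", 2),
    ("at", 8),
    ("field", "tail", 1),
])
print(fields, size)
```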
n8 asc "initial"
n8 buf 20 "surname"
n16 be unsigned dec "age"
3 pet "pet names"
3 n16 be unsigned dec "pet costs"
2 n32 le unsigned hex ptr person "2 pointers to parents"
2 n32 ptr person null "2 pointers, null legal"
person "a person"
n32 sym code "__main"
1024 n32 unsigned dec "memory as 32 bit words"
9 n16 map errorcodes "results"
Each example is of the form :-
opt-count type opt-attrs name
The field describes count data items of the specified type. count is restricted to being >= 1, and if it is > 1, then the field is initially displayed by just showing its type (eg: 10 n32 le unsigned hex "numbers"). When you select the field, you are presented with an element list, with count lines, from which you can select the element you are interested in.
The type of the data is one of n8, n16, n24, n32, buf N or DEFN, where DEFN is the name of a previously defined structure. This type may be considered to be the way in which BE is told the size of the data item concerned. n8, n16, n24 and n32 mean an 8, 16, 24 or 32 bit numeric data item. buf N means a buffer of N bytes.
The field has the default data display attributes, unless data display attribute keywords (as defined above) are included in the field definition.
In addition to the data display attribute keywords given above, there is the map MAP attribute, which means display the numeric field by looking up a textual equivalent of the numeric value using the mapping, which must have previously been defined.
The ptr DEFN attribute says that the numeric value is in fact a pointer to a structure of type DEFN. DEFN need not be defined yet in the initialisation file.
The mul/nomul attribute described above specifies whether to multiply the pointer value by the size of the data item being pointed to.
The null/nonull attribute described above specifies whether this pointer may be followed if the numeric value is 0.
The keyword add BASE may be used to add BASE to the pointer value.
Also, the rel/abs attribute described above specifies whether to add the address of the pointer itself to the numeric value.
By using combinations of the pointer keywords, various effects may be achieved :-
n32 ptr DEFN abs
n32 ptr DEFN add 0x40000 abs
n32 ptr DEFN mul add addr "table" abs
n32 ptr DEFN rel
n32 ptr DEFN add 8 rel
n32 le ptr DEFN abs seg
The procedure for following pointers is :-
If nonull and pointer is 0, then don't follow the pointer.
If mul, then multiply the pointer value by the size of the item being pointed to.
If add BASE, then add BASE to the pointer value.
If rel, then add the address of the pointer itself.
If seg, then mangle the pointer address to account for the 16:16 segmented mode of x86 processors.
The seg keyword works by taking the top 16 bits of the pointer value as the segment, the bottom 16 bits as the offset, and producing a new pointer value which is segment*16+offset. This feature may be of use for decoding large memory model program dumps which have been running on x86 processors running in real mode, or a 16:16 protected mode with a linear selector mapping. Anyone with a sensible file format to decode, or a dump taken from the memory space of a processor of a sensible architecture, can ignore this feature.
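The pointer-following procedure translates almost directly into code. The Python sketch below applies the steps in the order listed. It is an illustration rather than BE's implementation, and the function name `follow` and its keyword arguments are hypothetical.

```python
def follow(value, ptr_addr=0, target_size=1, *, null=False, mul=False,
           add=None, rel=False, seg=False):
    """Return the address a pointer leads to, or None if it must not be followed."""
    if not null and value == 0:
        return None                    # nonull: a 0 pointer is not followed
    if mul:
        value *= target_size           # mul: scale by the size of the pointed-to item
    if add is not None:
        value += add                   # add BASE
    if rel:
        value += ptr_addr              # rel: relative to the pointer's own address
    if seg:
        segment = (value >> 16) & 0xFFFF
        offset = value & 0xFFFF
        value = segment * 16 + offset  # 16:16 segmented pointer to linear address
    return value


print(hex(follow(3, target_size=8, mul=True, add=0x1000)))  # 0x1018
print(hex(follow(0x12340010, seg=True)))                    # 0x12350
```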
The keyword open may be given, and this has the effect of increasing the level of detail that is initially displayed. See the description of the level of detail of display feature later in this document. This feature has its problems (bugs), but can be used to ensure that small arrays and short structures are displayed in full without the user having to manually increase the level of detail.
Finally the name of the field must be given.
The initialisation file can contain the following, as long as it is outside of any other definition :-
include "anotherfile.ini"
Here is a snippet from a real initialisation file :-
le unsigned hex abs // set defaults, just to be sure
lj                  // allow ARM specific symbolic lookup of code addresses

map DE_
{
  "DP_Pending" -1
  "DS_Success" 0
  "DE_Failure" 1
}

def DPB
{
  n32 ptr DPB       "DPB_Next "
  n32 sym code      "DPB_Address"
  n8 map DC_        "DPB_Number "
  n8                "DPB_Flag2 "
  n8 map SY_        "DPB_Flag "
  n8 signed map DE_ "DPB_Dsb "
  n32               "DPB_Safety "
}

def NOP
{
  DPB     "NOP_Header"
  n8      "NOP_Spare1"
  n8      "NOP_Spare2"
  n8      "NOP_Spare3"
  n8 dec  "NOP_Period"
  n32 dec "NOP_Value "
  CLK     "NOP_Clock "
}

def main // the entire memory map
{
  at addr "noptable"   100 NOP     "noptable "
  at addr "currentdpb" n32 ptr DPB "currentdpb"
}
The supplied initialisation file contains enough definitions to enable you to examine the contents of many image file formats.
These include Windows / OS/2 Bitmaps, Targa files, KIPS files, ZSoft PCX, M-Motion Video, TIFF, ILBM IFF, Compu$erve GIF, RiscOS sprite, IBM PSEG, and OS/2 resource files.
The definitions in the initialisation file are in no way complete, or intended to be a definitive statement of such files' contents, but are merely intended to aid in the browsing of the contents of such files.
Limitations of BE make it awkward to decode certain data structures in some files, so the attitude taken is typically 'display as best you can', and where data may be of variable length 'display the first few bytes worth...'.
Although not displayed, the arrow keys, such as Up, Down, PgUp, PgDn, Home and End all work in the obvious ways, traversing the list on display. The Wordstar keys ^E, ^X, ^R, ^C, ^W and ^Z also work.
BE displays the non-obvious keys you may press on the 2nd line of its status area, at the top of the screen.
@X (ie: Alt+X), or q exits the program.
Esc exits the current screen back to the previous level.
f allows you to do a find over the list on display. This only searches as much as the user could see if he were to manually page up and down through the list. The find command is case sensitive. n can be used to repeat the last find.
i allows you to generate a display which only has lines which include a pattern you specify. For example, if you have an array of trace-point events, you can easily generate a list of just trace-points from one module. Similarly, x allows you to generate a display which excludes lines which match the pattern. Esc exits back to the original display.
The keys A,O,I toggle the display of addresses, offsets and array indices.
The r key causes a refresh. BE re-fetches all the data on display. The R key is a slightly more aggressive form of refresh. If an extension providing data to BE was caching data, this type of refresh causes it to drop its cache.
g/l is displayed if you are allowed to change the memory interpretation mode to big or little endian.
s/u is displayed if you are allowed to change the signed display mode to signed or unsigned.
A subset of the keys a/e/b/o/d/h/y/m may be displayed if you are allowed to change the viewing mode to ASCII, EBCDIC, binary, octal, decimal, hex, symbolic or via a mapping table.
+/- is displayed to indicate that the level of detail of display may be increased or decreased. Level 0 means display the data type only. Level 1 means display the first level of data. Levels 2 and above mean display additional levels of detail.
Increasing the level of display can make BE open up an array, and enumerate the elements. eg: 3 n32 to [123,123,456].
Increasing the level of display can also make BE open up a definition, and display the fields. eg: VAR to {"name",123}.
This is capable of opening up the datastructure pointed to by a pointer, providing the pointer may be fetched and followed.
Some examples :-
level 0        level 1        level 2            level 3
-------        -------        -------            -------
n32            7
3 n32          3 n32          [8,9,10]
VAR            VAR            {"a",1}
2 VAR          2 VAR          [VAR,VAR]          [{"b",2},{"c",3}]
n16 ptr VAR    22->VAR        22->{"d",4}
2 n8 ptr VAR   2 n8 ptr VAR   [33->VAR,44->VAR]  [33->{"e",5},44->{"f",6}]
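The pattern in these examples can be captured by a small recursive renderer. The Python sketch below is inferred purely from the examples, so it may not match BE's behaviour in other cases. The class names, the `render` function and the helper `var` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Scalar:
    type_name: str
    value: object


@dataclass
class Array:
    type_name: str
    elements: list


@dataclass
class Struct:
    type_name: str
    fields: list


@dataclass
class Pointer:
    type_name: str
    value: int
    target: object


def render(node, level):
    """Render a node at the given level of detail."""
    if level == 0:
        return node.type_name  # level 0: data type only
    if isinstance(node, Scalar):
        v = node.value
        return f'"{v}"' if isinstance(v, str) else str(v)
    if isinstance(node, Pointer):
        # The pointed-to item is shown at the same level as the pointer itself.
        return f"{node.value}->{render(node.target, level)}"
    if level == 1:
        return node.type_name  # arrays and definitions stay folded at level 1
    children = node.elements if isinstance(node, Array) else node.fields
    inner = ",".join(render(child, level - 1) for child in children)
    return f"[{inner}]" if isinstance(node, Array) else "{" + inner + "}"


def var(name, number):
    """Build a two-field VAR like the ones in the examples."""
    return Struct("VAR", [Scalar("buf", name), Scalar("n32", number)])


pointers = Array("2 n8 ptr VAR", [
    Pointer("n8 ptr VAR", 33, var("e", 5)),
    Pointer("n8 ptr VAR", 44, var("f", 6)),
])
for level in range(4):
    print(level, render(pointers, level))
```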
Enter is displayed if you can press enter to either show the contents of the sub-definition, or to follow a pointer and show the definition there. The Esc key brings you back to where you are now.
Pressing @ will cause BE to prompt for a structure definition name, and then an address. It will then decode the memory at the given address as if it were of the specified structure type.
The binary file arguments to BE are normally of the form :-
filename[@address]
This tells BE to load the file and whenever data at a memory address from address to address+filelength is accessed, to supply the data from the file.
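The idea of serving memory reads from files loaded at given addresses can be sketched in Python as follows. This is a toy model, not BE's memory layer. The class `MemoryImage` is hypothetical, it holds byte strings rather than opening files, and returning None for unmapped addresses is an assumption.

```python
class MemoryImage:
    """Toy model of several binary files loaded at non-overlapping addresses."""

    def __init__(self):
        self.regions = []  # list of (base_address, data)

    def load(self, data, address=0):
        self.regions.append((address, data))

    def read(self, address, length):
        for base, data in self.regions:
            if base <= address and address + length <= base + len(data):
                start = address - base
                return data[start:start + length]
        return None  # assumption: nothing is supplied for unmapped addresses


mem = MemoryImage()
mem.load(b"\x01\x02\x03\x04", 0)   # stands in for gizmo.rom
mem.load(b"\xaa\xbb", 0x8000)      # stands in for gizmo.ram@0x8000
print(mem.read(0x8001, 1))
```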
However, it is possible to supply binary file arguments of the form :-
extension!args[@address]
Under OS/2, BE will ensure that BEextension.DLL is loaded. This DLL should be on the LIBPATH and should contain certain entrypoints which will be used by BE. BE then passes the args and address to the extension DLL, which does something of its own choosing with them. The extension DLL can then supply data to BE on request.
Under Windows, provision for extension DLLs also exists. The DLL is located according to the algorithm used by the Win32 LoadLibrary API.
One use of this might be the provision of an extension for handling files too massive to load into memory all at once. The extension could open a file handle and read bytes demanded by BE upon request. This extension could be provided in BEBIGFIL.DLL, and the user could type :-
be bigfil!verybigfile.dat
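The two argument forms, filename[@address] and extension!args[@address], could be told apart along the following lines. This Python sketch is a guess at a reasonable parsing rule, not a description of how BE actually splits its arguments; the function name `parse_binfile_arg` is hypothetical.

```python
def parse_binfile_arg(arg):
    """Split a binary file argument into (extension, name_or_args, address)."""
    address = 0  # default load address
    if "@" in arg:
        arg, addr_text = arg.rsplit("@", 1)
        address = int(addr_text, 0)  # accepts forms such as 0x8000
    extension = None
    if "!" in arg:
        extension, arg = arg.split("!", 1)  # the text before '!' names the extension
    return extension, arg, address


print(parse_binfile_arg("gizmo.ram@0x8000"))
print(parse_binfile_arg("bigfil!verybigfile.dat"))
```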
Another use might be in live-debug of adapter cards. The extension would provide data bytes from the memory space of the adapter. args could be used to identify the slot the adapter is in.
Yet another use might be providing BE with access to physical or virtual or process specific linear address spaces, perhaps via the use of a device driver. Shared memory windows might give addressability of datastructures in other programs.
Also, the surface of a disk or block device could be made accessible via an extension.
Perhaps bytes sent down a communications port could be made to appear as a stream of binary data.
The file bememext.h documents the extension interface. Currently extensions may only be built for the OS/2 version of BE using the IBM C-Set++ compiler, or the Win32 version of BE using MS Visual C++. I anticipate learning about shared library support on the various different types of UNIX, enabling similar tricks to be performed there.
BE can be found on the ... You get a selection of executables, and the one to pick depends upon which operating system you wish to run :-
Unfortunately I don't have continual access to all the platforms, so improvements in one version may not yet be reflected in the others.
Copying of this program is encouraged, as it is fully public domain. Caveat Emptor.